The relationship between the (generalized) mean
Kullback-Leibler's information and the (generalized) maximum
likelihood principle is exploited in this report to analyze the
state estimation problems of both discrete-time and
continuous-time uncertain non-linear systems.

1. INTRODUCTION


In  solving  practical  state  estimation  problems,  we  often 
encounter  two  difficult  questions.  The  first  question  is  related 
to  the  accuracy  of  the  deterministic  model  used  for  fitting  the 
dynamics  of  measurements.  The  second  question  concerns  the 
accuracy  of  the  statistical  model  of  measurement  errors.  Since 
the  exact  mathematical  representation  of  the  physical  measurement 
process  is  not  known,  a  conservative  but  prejudiced  approach  to 
resolve  the  above  two  questions  is  to  adjust  parameters  in  the 
model  until  measurement  residuals  are  acceptable.  (This  approach 
is  prejudiced  because  almost  all  anomalies  in  the  residuals  can 
be  made  to  disappear  by  carefully  adjusting  parameters  in  the 
model.)  The  resulting  state  estimates,  therefore,  differ 
significantly  for  different  practitioners  who  depend  heavily  on 
models and personal experience in residual analysis. Moreover,
the accuracy of the dynamical model and the statistical behavior
of the measurement process must be traded off against each other,
a fact especially well known to those who use Kalman filters
extensively. For a given measurement accuracy, it was observed
in [1] and [2] that a Kalman filter might diverge due to the
inaccuracy in the dynamical model. Adding so-called "process
noise" to the dynamical model may prevent filter divergence
[1,3,4]. A detailed analysis of filter divergence for a
time-invariant linear system is documented in [1]. In regard to
the selection of the covariance matrix of the process noise,
people in the field often admit that it is more an art than a
science.

Akaike [5] applied the mean Kullback-Leibler's information
(MKLI) [6] to extend the maximum likelihood principle. Perhaps
the most astonishing result of [5] in terms of the impact on time
series analysis is that a computable quantity called model
unreliability is introduced and applied to some practical
problems. The combination of model unreliability and badness of
fit was used in [5] and [7] as a measure to select parameters in
a model for a stationary, ergodic process. The same idea was
extended recently in [8] to determine the order of a linear
time-varying auto-regressive model.

In  this  report,  we  follow  the  reasoning  in  [6]  and  [8]  to 
address  when  to  terminate  adjusting  parameters  in  a  non-linear 
system  and  how  to  select  the  best  non-linear  state  estimate  among 
many  candidates.  However,  only  the  asymptotic  result  is 
obtained.  Further  studies  are  required  to  extend  the  result 
reported  herein  to  cover  the  finite  sample  cases. 

The structure of this report is summarized in Figures 1 and
2. These figures can also be thought of as a logic tree that
describes how the individual results throughout the report are
linked.

[Figure 1: The MKLI (a measure of distance between the assumed
model and the truth, §3.A and §3.B; not computable because the
truth is not known) is connected by asymptotic theories
(Theorems 3.1, 3.2, and 3.3) to the MLE (computable, §3.A and
§3.B).]

Fig. 1. The structure of the report: fixed dynamic models.

[Figure 2: The generalized MKLI (a measure of distance between
the model and the truth, §3.C; not computable) is connected by
asymptotic theories (Theorems 3.5 and 3.6) to the generalized MLE
(computable, §3.C).]

Fig. 2. The structure of the report: tunable dynamic models.

2. NOTATIONS AND PRELIMINARIES

In this section, we formulate the general problem to be
addressed. Let $z(t)$ be an m-dimensional vector measurement
process. The true representation of $z(t)$ is assumed to be given
by

$$z(t) = h_o(x_o(0), x_o(1), \ldots, x_o(t); t) + n_o(t) \tag{2.1}$$

where $h_o$ is an m-dimensional single-valued function
differentiable with respect to the arguments, and $n_o(t)$ is
m-dimensional, zero-mean white Gaussian noise with a
positive-definite covariance matrix $R_o$, denoted by $R_o > 0$.
Throughout the report, a subscript "o" refers to the true model.
Note that the probability density function of $z(0), \ldots, z(t_1)$,
denoted by $P_o$, is well defined and uniquely determined by
System (2.1).

A mathematical model different from (2.1) is generally used.
Let the mathematical model be given by

$$x(t+1) = f(x(t),t), \quad \text{initial condition } x(0) \tag{2.2a}$$

$$z(t) = h(x(t),t) + n(t) \tag{2.2b}$$

where $x$ is a q-dimensional vector and $n(t)$ is a zero-mean white
Gaussian noise process with a positive-definite covariance matrix
$R$. It is also assumed that $f$ and $h$ possess the same analytic
properties as $h_o$. The probability density function induced by
(2.2) is denoted by $P$. Furthermore, we assume

$$h_o(x_o(0,t),t) = h(\varphi(x_o(0,t)),t) \tag{2.3}$$

where $\varphi$ is a function that maps the nt-dimensional Euclidean
space to the q-dimensional Euclidean space and
$x_o(0,t) = (x_o(0), \ldots, x_o(t))$.
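As a rough illustration of Model (2.2), the noise-free part of the measurement sequence can be generated by iterating (2.2a) and evaluating the measurement function of (2.2b). The Python sketch below represents the state as a list of floats. The function names `propagate` and `predicted_measurements` and the constant-velocity example are assumptions made for illustration only; they do not appear in the report.

```python
from typing import Callable, List

State = List[float]

def propagate(f: Callable[[State, int], State], x0: State, t1: int) -> List[State]:
    """Return x(0), ..., x(t1-1) by iterating x(t+1) = f(x(t), t), as in (2.2a)."""
    states = [list(x0)]
    for t in range(t1 - 1):
        states.append(f(states[-1], t))
    return states

def predicted_measurements(f, h, x0: State, t1: int) -> List[List[float]]:
    """Noise-free part h(x(t; x(0)), t) of (2.2b) for t = 0, ..., t1-1."""
    return [h(x, t) for t, x in enumerate(propagate(f, x0, t1))]

# Hypothetical example: state = [position, velocity], position-only measurement.
def f_cv(x: State, t: int) -> State:
    return [x[0] + x[1], x[1]]

def h_pos(x: State, t: int) -> List[float]:
    return [x[0]]

example = predicted_measurements(f_cv, h_pos, [0.0, 2.0], 4)
```

Because the whole trajectory is a function of x(0) alone, estimating the state in a fixed dynamical model reduces to estimating the initial condition.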

Many practical problems can be formulated by (2.2) for
estimating the initial state x(0) from measurements. Equation
(2.2a) describes the physical law governing the state vector,
whereas (2.2b) models the measurement function. In reality,
the exact physical law is either not known completely or is too
complicated to be applied directly. On the other hand, the
functional relationship between a given state vector and the
deterministic measurements is usually known. However, exact
statistical properties of measurement noise n(t) are seldom
known. The trajectory estimation problem is a typical example
that fits the above description exactly. The ballistic
trajectory of an object is governed by Newton and Euler
equations. An important part of the driving forces and torques
in Newton and Euler equations is due to air pressure. In
aerodynamics, pressure is best modeled by a potential equation
which describes the velocity field of the air. It is impossible
with current technology to incorporate a potential equation with
Newton and Euler equations into the framework of the trajectory
estimation problem. On the other hand, what a radar can measure
about the target motion is modeled by (2.3).

The solution of the non-linear difference equation (2.2a) is
unique and denoted by $x(t;x(0))$. The Jacobian matrices $F(t)$ and
$H(t)$ are defined by

$$F(t) = \left.\frac{\partial f(x,t)}{\partial x}\right|_{x = x(t;x(0))} \tag{2.4}$$

$$H(t) = \left.\frac{\partial h(x,t)}{\partial x}\right|_{x = x(t;x(0))} \tag{2.5}$$

The transition matrix of $F(t)$, denoted by $\Phi(t,\tau)$, satisfies the
following difference equation

$$\Phi(t+1,\tau) = F(t)\,\Phi(t,\tau); \quad \Phi(\tau,\tau) = I \tag{2.6}$$

where $I$ is the q x q identity matrix. Furthermore, we define the
R-observability Gramian $M(x(0);t_1)$ by

$$M(x(0);t_1) = \sum_{t=0}^{t_1-1} \Phi^T(t,0)\,H^T(t)\,R^{-1}\,H(t)\,\Phi(t,0) \tag{2.7}$$

where superscripts "T" and "-1" denote matrix transpose and
inverse, respectively. The meaning of the observability Gramian
with respect to observability and unbiased estimation of System
(2.2) is described in [10].
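The Gramian (2.7) can be accumulated alongside the transition-matrix recursion (2.6). The sketch below is a minimal pure-Python version under two simplifying assumptions that are not in the report: the measurement is scalar (so H(t) is a single row and R is a positive number `r`), and the sum runs over as many terms as there are rows in `H_seq`. The names `matmul`, `gramian` and `det2` are illustrative.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gramian(F_seq: List[Matrix], H_seq: List[List[float]], r: float) -> Matrix:
    """Sum over t of Phi(t,0)^T H(t)^T r^{-1} H(t) Phi(t,0), scalar measurement.

    F_seq[t] is the q x q Jacobian F(t), H_seq[t] the 1 x q row H(t),
    and r > 0 the scalar measurement-noise variance.
    """
    q = len(H_seq[0])
    phi = [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]  # Phi(0,0) = I
    M = [[0.0] * q for _ in range(q)]
    for t, H in enumerate(H_seq):
        row = matmul([H], phi)[0]          # H(t) Phi(t,0), a 1 x q row
        for i in range(q):
            for j in range(q):
                M[i][j] += row[i] * row[j] / r
        phi = matmul(F_seq[t], phi)        # Phi(t+1,0) = F(t) Phi(t,0), as in (2.6)
    return M

def det2(m: Matrix) -> float:
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical constant-velocity model with a position-only measurement.
F = [[1.0, 1.0], [0.0, 1.0]]
H = [1.0, 0.0]
one_sample = gramian([F], [H], 1.0)
two_samples = gramian([F, F], [H, H], 1.0)
```

In this example one measurement leaves the Gramian singular (velocity cannot be inferred from a single position), while two measurements make it positive definite.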

In the first part of this report, we address the problem of
estimating x(t) from the observed sample for two different
situations. When the form of f in (2.2a) is fixed (except for
the initial condition x(0)), we shall classify this case as a
fixed dynamical model. We shall call the other case a tunable
dynamical model when the functional form of f is not fixed.

3.  DISCRETE-TIME  UNCERTAIN  NON-LINEAR  SYSTEMS 

A.  MKLI  and  MLE 

The mean Kullback-Leibler's information (MKLI) is a function
of the likelihood ratio which gives a measure of separation
between two probability distributions. The normalized MKLI of
(2.1) and (2.2) is given by

where

$$L(x(0),R,t_1) = \ln|R| + \bar d(x(0),R,t_1) \tag{3.2}$$

and $\bar d$ is defined by the following two equations:

$$d(x(0),R,t_1) = \frac{1}{t_1}\sum_{t=0}^{t_1-1} \|z(t) - h(x(t;x(0)),t)\|^2_{R^{-1}} \tag{3.3}$$

$$\bar d(x(0),R,t_1) = E_o\, d(x(0),R,t_1) = \mathrm{Tr}(R_o R^{-1}) + \frac{1}{t_1}\sum_{t=0}^{t_1-1} \|h_o(x_o(0,t),t) - h(x(t;x(0)),t)\|^2_{R^{-1}} \tag{3.4}$$

Note that $E_o$ is the expectation operator with respect to the
true probability density function $P_o$, $|\cdot|$ denotes the
determinant of the enclosed matrix, "Tr" denotes the trace of a
matrix, and $\|\cdot\|$ denotes the Euclidean norm, with the subscript
$R^{-1}$ indicating the weighting $\|v\|^2_{R^{-1}} = v^T R^{-1} v$. A smaller
value of $W(t_1)$ means that the corresponding Model (2.2) is closer
to the truth in the sense of MKLI.




The maximum likelihood estimate (MLE) of $(x(0),R)$, denoted by
$(\hat x(0/t_1), \hat R(t_1))$, is defined to be the minimum point of

$$J(x(0),R,t_1) = \ln|R| + d(x(0),R,t_1) \tag{3.5}$$

It is easy to verify that

$$L(x(0),R,t_1) = E_o\, J(x(0),R,t_1) \tag{3.6}$$

Note that the second term of (3.1) is a constant for a given
observed sample and independent of the assumed mathematical
model. Because of this fact and (3.6), it is not surprising to
see that the MKLI and the MLE are very closely related.
Indeed, they are shown to be equivalent with probability one with
respect to the true probability density function (w.p.1, $P_o$)
asymptotically if the limit of $L$ with respect to $t_1$ is
unimodal [8].
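For intuition, the computable quantities (3.3) and (3.5) are easy to evaluate when the measurement is scalar. The Python sketch below assumes a scalar measurement (so R is a positive number `r`) and takes the model predictions h(x(t;x(0)),t) as a precomputed list; both are simplifications for illustration, and the function names are hypothetical. For fixed predictions with mean squared residual s, the cost ln r + s/r is minimized at r = s, which the last function returns.

```python
import math
from typing import Sequence

def d_stat(z: Sequence[float], h_pred: Sequence[float], r: float) -> float:
    """(3.3) for scalar measurements: (1/t1) * sum of (z(t) - h(t))^2 / r."""
    t1 = len(z)
    return sum((zt - ht) ** 2 for zt, ht in zip(z, h_pred)) / (t1 * r)

def j_cost(z: Sequence[float], h_pred: Sequence[float], r: float) -> float:
    """(3.5) for scalar measurements: ln r + d."""
    return math.log(r) + d_stat(z, h_pred, r)

def best_r(z: Sequence[float], h_pred: Sequence[float]) -> float:
    """Minimizer of ln r + s/r over r > 0, i.e. the mean squared residual s."""
    return sum((zt - ht) ** 2 for zt, ht in zip(z, h_pred)) / len(z)

# Hypothetical data: three measurements and the corresponding model predictions.
z_obs = [1.0, 2.0, 4.0]
h_mod = [1.5, 2.0, 3.0]
r_hat = best_r(z_obs, h_mod)
```

At the minimizing variance the normalized residual term d equals one, so J reduces to ln r_hat + 1; a poorer fit shows up only through a larger ln r_hat.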

B.  FIXED  DYNAMICAL  MODELS 

We shall first establish the unimodal condition of
$L(x(0),R,t_1)$ for any finite $t_1$ and then state the equivalent
relationship between the MLE and the MKLI for the model given by
(2.2).

It requires three steps to establish the unimodal condition
of $L(x(0),R,t_1)$. The first step is summarized by the following
theorem.

Theorem 3.1 For a given $R > 0$, we hypothesize that

(i) $x(0) \in S$, where $S$ is a convex and compact subset of $\mathbb{R}^q$

(ii) the observability Gramian $M(x(0);t_1) > 0$ for all
$x(0) \in S$

(iii) for any $b_1, b_2 \in S$ that minimize $\bar d(x(0),R,t_1)$
and $c(t) \in \mathbb{R}^q$ for each $t$, there exists $b_3$ in $S$
such that

$$\sum_{t=0}^{t_1-1} \|2h(c(t)) - (h(b_1) + h(b_2))\|^2_{R^{-1}} \ge 4 \sum_{t=0}^{t_1-1} \|h(c(t)) - h(b_3)\|^2_{R^{-1}}.$$

Under the above three hypotheses, there exists a unique minimum
point of $\bar d(x(0),R,t_1)$ in $S$.

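Proof The existence of a minimum point in $S$ is guaranteed by
the hypothesis that $\bar d$ is a continuous function defined over a
compact set $S$. Let $b_1$ and $b_2$ be two minimum points of $\bar d$ in
$S$. Let

$$h_i = h(x(t;b_i),t) \quad \text{for } i = 1,2.$$

By the parallelogram law, we have

$$2\|h_o - h_1\|^2_{R^{-1}} + 2\|h_o - h_2\|^2_{R^{-1}} = \|2h_o - (h_1 + h_2)\|^2_{R^{-1}} + \|h_1 - h_2\|^2_{R^{-1}} \tag{3.7}$$

where $h_o$ stands for $h_o(x_o(0,t),t)$. By Hypotheses (i) and (iii)
and by (2.3), we have

$$b_3 = \tfrac{1}{2}(b_1 + b_2) \in S, \text{ and}$$

$$\sum_{t=0}^{t_1-1} \|2h(\varphi(x_o(0,t))) - (h_1 + h_2)\|^2_{R^{-1}} \ge 4 \sum_{t=0}^{t_1-1} \|h(\varphi(x_o(0,t))) - h(b_3)\|^2_{R^{-1}} \tag{3.8}$$

for all $x_o(0,t) \in \mathbb{R}^{nt}$. Let $c(t) = \varphi(x_o(0,t))$. By the fact
that both $b_1$ and $b_2$ are minimum points of $\bar d$ and by (3.7) as
well as (3.8), we have

$$\sum_{t=0}^{t_1-1} \|h_1 - h_2\|^2_{R^{-1}} = 4 \sum_{t=0}^{t_1-1} \|h(c(t)) - h_1\|^2_{R^{-1}} - \sum_{t=0}^{t_1-1} \|2h(c(t)) - (h_1 + h_2)\|^2_{R^{-1}} \tag{3.9}$$

The right-hand side of (3.9) is non-positive: by (3.8) the last
sum is at least $4\sum_t \|h(c(t)) - h(b_3)\|^2_{R^{-1}}$, which in turn is at
least $4\sum_t \|h(c(t)) - h_1\|^2_{R^{-1}}$ because $b_1$ is a minimum point.
Equation (3.9) therefore implies that

$$\sum_{t=0}^{t_1-1} \|h_1 - h_2\|^2_{R^{-1}} = 0 \tag{3.10}$$

By Hypothesis (ii) and Corollary 2.1.1 of [10], (3.10)
implies

$$b_1 = b_2$$

Q.E.D.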


Now let $S$ be a set of initial states for Model (2.2). For a
fixed vector $b_1$ in $S$, it is known (for example, use the
technique introduced in [11]) that there exists a unique
covariance matrix given by

$$R_1 = R_o + X(b_1,t_1) \tag{3.11}$$

which minimizes $L(x(0),R,t_1)$, where $X(b_1,t_1)$ is defined by

$$X(b_1,t_1) = \frac{1}{t_1}\sum_{t=0}^{t_1-1} (h_o - h(x(t;b_1),t))(h_o - h(x(t;b_1),t))^T \tag{3.12}$$

Let $C$ be a subset of m x m positive-definite matrices containing
$R_o$ and those generated by $S$ through (3.11) and (3.12).

Finally, the existence of a unique model is summarized by the
following theorem.


Theorem 3.2 Let $S$ be a convex and compact set of initial
states, and $C$ be the partially-ordered set of positive-definite
matrices defined above. If Hypotheses (ii) and (iii) of Theorem
3.1 hold for all $R$ in $C$ then there exists a unique point in $S \times C$
which minimizes $L(x(0),R,t_1)$.


Proof By Theorem 3.1, there exists a unique vector $b$ in $S$
which minimizes $\bar d(x(0),R,t_1)$ for a given $R$ in $C$. We can
construct a sequence in $S \times C$ as follows:

(1) Let $b_1$ be the unique vector in $S$ which minimizes
$\bar d(x(0),R_o,t_1)$ and

$$R_1 = R_o + X(b_1,t_1)$$

(2) $b_k$ is defined to be the unique vector in $S$ which
minimizes $\bar d(x(0),R_{k-1},t_1)$ and

$$R_k = R_o + X(b_k,t_1)$$

Since $X(b,t_1)$ is continuous over a compact $S$, $S \times C$ is also
compact. There exists a limit point of $(b_k,R_k)$, denoted by
$(\hat b,\hat R)$, in the compact set $S \times C$. We shall prove that the limit
point is unique.

By construction, we have

$$\bar d(b_{k+1},R_k,t_1) \le \bar d(b_k,R_k,t_1) \tag{3.13}$$

Therefore, by definition and (3.13), we have

$$L(b_{k+1},R_{k+1},t_1) \le L(b_{k+1},R_k,t_1) \le L(b_k,R_k,t_1)$$

Hence, $\{L(b_k,R_k,t_1)\}$ is non-increasing and bounded below in
$\mathbb{R}^1$. Thus, the limit exists and is denoted by $L_\infty$. Let
$\{(b_{k'},R_{k'})\}$ be a subsequence such that

$$\lim_{k' \to \infty} (b_{k'},R_{k'}) = (\hat b,\hat R)$$

Since $L(b,R,t_1)$ is a continuous function defined over $S \times C$, we
have

$$\lim_{k' \to \infty} L(b_{k'},R_{k'},t_1) = L(\hat b,\hat R,t_1)$$

Because the limit of a convergent sequence is unique, we have

$$L(\hat b,\hat R,t_1) = L_\infty$$

Suppose that $(b_1,R_1)$ is a minimum point of $L$ in $S \times C$. If
$b_1 = \hat b$ then (3.11) implies that $\hat R = R_1$. If $b_1 \ne \hat b$, by
Hypotheses (ii) and (iii) of Theorem 3.1 for all $R$ in $C$, we
should have either $X(\hat b,t_1) > X(b_1,t_1)$ or $X(\hat b,t_1) < X(b_1,t_1)$.
Assuming $X(\hat b,t_1) > X(b_1,t_1)$, we have

$$\bar d(b_1,\hat R,t_1) < \bar d(\hat b,\hat R,t_1) \tag{3.14}$$

Inequality (3.14) is a contradiction because $\hat b$ minimizes
$\bar d(x(0),\hat R,t_1)$. On the other hand, if we have

$$\bar d(\hat b,R_1,t_1) < \bar d(b_1,R_1,t_1)$$

then

$$L(\hat b,R_1,t_1) < L(b_1,R_1,t_1) \tag{3.15}$$

Again, it contradicts the assumption that $(b_1,R_1)$ is a
minimum point of $L$. Therefore, $b_1 = \hat b$ and $R_1 = \hat R$.
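The sequence $(b_k, R_k)$ built in the proof can be imitated numerically in a toy setting. The Python sketch below assumes a scalar measurement, a finite list of candidate vectors in place of the set S, and a known noise-free truth `h_true`, so it is usable only in simulation (the report stresses that the MKLI itself is not computable in practice). All names and the constant-level model are hypothetical. In the scalar case the minimizing b does not depend on R, so the iteration settles after one step.

```python
import math
from typing import Callable, List, Sequence, Tuple

def x_term(b: float, h_true: Sequence[float], h_model: Callable[[float, int], float]) -> float:
    """Scalar version of X(b, t1): mean squared gap between truth and model."""
    t1 = len(h_true)
    return sum((h_true[t] - h_model(b, t)) ** 2 for t in range(t1)) / t1

def d_bar(b, r, r_true, h_true, h_model) -> float:
    """Scalar version of (3.4): (R_o + X(b, t1)) / R."""
    return (r_true + x_term(b, h_true, h_model)) / r

def l_cost(b, r, r_true, h_true, h_model) -> float:
    """Scalar version of (3.2): ln R + d_bar."""
    return math.log(r) + d_bar(b, r, r_true, h_true, h_model)

def alternate(candidates: Sequence[float], r_true: float, h_true: Sequence[float],
              h_model: Callable[[float, int], float], steps: int = 5) -> List[Tuple[float, float, float]]:
    """Alternate b_k = argmin d_bar(., R_{k-1}) and R_k = R_o + X(b_k)."""
    r = r_true
    history = []
    for _ in range(steps):
        b = min(candidates, key=lambda c: d_bar(c, r, r_true, h_true, h_model))
        r = r_true + x_term(b, h_true, h_model)
        history.append((b, r, l_cost(b, r, r_true, h_true, h_model)))
    return history

# Hypothetical example: the truth ramps 1, 2, 3 but the model is a constant level b.
grid = [0.5 * i for i in range(9)]            # candidates 0.0, 0.5, ..., 4.0
history = alternate(grid, 1.0, [1.0, 2.0, 3.0], lambda b, t: b)
```

The recorded costs are non-increasing, mirroring the monotonicity argument that follows (3.13), and the final variance is inflated above the true variance by the unmodeled ramp.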
The equivalent relationship between the MLE and the MKLI is
established by two steps. For the first step, we shall assume
that Theorem 3.2 holds asymptotically as $t_1$ approaches
infinity, and let $(\hat b,\hat R)$ be the unique minimum point of
$L_\infty(x(0),R)$ in $S \times C$, where $L_\infty(x(0),R)$ is the limiting
function of $L(x(0),R,t_1)$. By Theorem 1 of [8], we have

$$\lim_{t_1 \to \infty} (\hat x(0/t_1), \hat R(t_1)) = (\hat b,\hat R), \quad \text{w.p.1, } P_o \tag{3.16}$$

Note that $(\hat x(0/t_1),\hat R(t_1))$ is the MLE of $(x(0),R)$.

It is proved in [6] that the MKLI defined by (3.1) is a
non-negative quantity and is equal to zero if and only if
$P(Z_0^{t_1}) = P_o(Z_0^{t_1})$ w.p.1, $P_o$. This property is a
basis for MKLI to provide a measure of distance between the truth
(2.1) and Model (2.2). For the second step of establishing the
equivalence between the MLE and the MKLI, we shall prove that
this important property is preserved when $(x(0),R)$ is replaced by
the MLE $(\hat x(0/t_1),\hat R(t_1))$ in the definition of MKLI. For this
purpose, we have the following theorem.

Theorem 3.3 Let $S \times C$ be the same set defined in Theorem 3.2
such that (3.16) holds for all $(x(0),R) \in S \times C$ and $(\hat b,\hat R)$ is the
unique minimum point of $L_\infty(x(0),R)$ in $S \times C$. Let
$(\hat x(0/t_1),\hat R(t_1))$ be the MLE which minimizes $J(x(0),R,t_1)$
over $S \times C$. Then,

$$\lim_{t_1 \to \infty} E_o\, J(\hat x(0/t_1),\hat R(t_1),t_1) = L_\infty(\hat b,\hat R) \tag{3.17}$$
Proof By the definition of $d(x(0),R,t_1)$, we have

$$E_o J^2(x(0),R,t_1) \le 2\left[(\ln|R|)^2 + u(x(0),R,t_1)\right] \tag{3.18}$$

where

$$u(x(0),R,t_1) = E_o\left[\frac{1}{t_1}\sum_{t=0}^{t_1-1} \|z(t) - h(x(t;x(0)),t)\|^2_{R^{-1}}\right]^2 \tag{3.19}$$

By the hypothesis that $L(x(0),R,t_1)$ converges for all
$(x(0),R) \in S \times C$, we have for sufficiently large $t_1$

$$u(x(0),R,t_1) = \left[\mathrm{Tr}(R_o R^{-1}) + \frac{1}{t_1}\sum_{t=0}^{t_1-1} \|h_o - h\|^2_{R^{-1}}\right]^2 + O\!\left(\frac{1}{t_1}\right) \tag{3.20}$$

where we have

$$\lim_{t_1 \to \infty} O\!\left(\frac{1}{t_1}\right) = 0$$

By (3.18)-(3.20), $E_o J^2(x(0),R,t_1)$ is uniformly bounded
in $t_1$ for all $(x(0),R) \in S \times C$. Since $(\hat x(0/t_1),\hat R(t_1)) \in S \times C$,
$J(\hat x(0/t_1),\hat R(t_1),t_1)$ is uniformly integrable for all $t_1$. By
the continuity assumption of $J$ over $S \times C$ and the uniform
integrability theorem, we have

$$\lim_{t_1 \to \infty} E_o\, J(\hat x(0/t_1),\hat R(t_1),t_1) = E_o \lim_{t_1 \to \infty} J(\hat x(0/t_1),\hat R(t_1),t_1) = L_\infty(\hat b,\hat R)$$

Q.E.D.


Theorem 3.3 assures that the basis for MKLI to be an
information measure is preserved asymptotically if $(x(0),R)$ is
replaced by $(\hat x(0/t_1),\hat R(t_1))$ because the second term in (3.1)
is independent of the assumed model. We shall call it the
generalized mean Kullback-Leibler's information (GMKLI) if the
estimate is used to replace $(x(0),R)$ in the definition (3.1).
The idea of GMKLI is applied in the next few subsections for
tuning process noise.

C. TUNABLE DYNAMICAL MODELS

For a given sample function $Z_0^{t_1}$, an extended Kalman
filter can be constructed based on the mathematical model given
by (2.2). The predicted estimate of $x(t)$, denoted by $\hat x(t)$, is
derived by

$$\hat x(t) = F(t-1)\,\hat x(t-1) + G(t-1)\,\nu(t-1) \tag{3.21a}$$

$$\nu(t-1) = z(t-1) - H(t-1)\,\hat x(t-1) \tag{3.21b}$$

where $F(t-1)$ and $H(t-1)$ are the Jacobian matrices of
$f(x(t-1),t-1)$ and $h(x(t-1),t-1)$ evaluated at the updated estimate
of $x(t-1)$. The matrix $G(t)$ in (3.21a) is given by

$$G(t) = F(t)\,K(t) \tag{3.21c}$$

$$K(t) = \Sigma(t)H^T(t)\left[H(t)\Sigma(t)H^T(t) + R\right]^{-1} \tag{3.21d}$$

$$\Sigma(t) = F(t-1)\,\Sigma^+(t-1)\,F^T(t-1) + Q \tag{3.21e}$$

$$\Sigma^+(t) = \left[I - K(t)H(t)\right]\Sigma(t) \tag{3.21f}$$

where $Q$ in (3.21e) is a non-negative definite matrix which is
often called the covariance matrix of process noise that models
the mismatch of Model (2.2). The state estimate $\hat x(t)$ can be
computed recursively for all $t$, $0 < t < t_1$, if $\hat x(0)$, $\Sigma(0)$,
$R$, and $Q$ are specified. Let $\theta$ denote the totality of all
parameters for specifying $\hat x(0)$, $\Sigma(0)$, $Q$; $\hat x(t;\theta,R)$ denotes
the dependence of the state estimate on the parameters $\theta$ and $R$.
The dynamical Model (2.2a) is transformed into (3.21a) and
(3.21b), which certainly are tunable. We shall address how to
tune the model from the observed sample.
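The recursion (3.21) is compact enough to sketch directly. The Python version below restricts it to a scalar state and scalar measurement with constant coefficients `f` and `h`, which is an assumption made purely to keep the example short; the function name `predictor_filter` and the constant-signal data are likewise illustrative. Each loop iteration advances the predicted estimate and its covariance by one step.

```python
from typing import List, Sequence, Tuple

def predictor_filter(z: Sequence[float], x0: float, sigma0: float, r: float, q: float,
                     f: float = 1.0, h: float = 1.0) -> Tuple[List[float], float]:
    """Scalar, constant-coefficient form of recursion (3.21).

    Returns the predicted estimates x_hat(0), ..., x_hat(t1-1) used to form
    the residuals, together with the next prediction x_hat(t1).
    """
    x_hat, sigma = x0, sigma0
    preds = []
    for zt in z:
        preds.append(x_hat)
        k = sigma * h / (h * sigma * h + r)   # gain, (3.21d)
        nu = zt - h * x_hat                   # residual, (3.21b)
        x_hat = f * x_hat + f * k * nu        # (3.21a) with G = F K from (3.21c)
        sigma_plus = (1.0 - k * h) * sigma    # updated covariance, (3.21f)
        sigma = f * sigma_plus * f + q        # predicted covariance, (3.21e)
    return preds, x_hat

# Hypothetical data: a constant level of 5 observed without noise, no process noise.
preds, last = predictor_filter([5.0] * 50, 0.0, 10.0, 1.0, 0.0)
```

Here the tunable parameters of the report correspond to the arguments `x0`, `sigma0` and `q` (together forming θ) and `r`.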
GMKLI AND GMLE


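The GMKLI of (3.21) with respect to (2.1) is defined by

$$W(t_1) = L(\theta,R,t_1) - L_o(t_1) \tag{3.22a}$$

where

$$L(\theta,R,t_1) = \ln|R| + \frac{1}{t_1}\sum_{t=0}^{t_1-1} E_o \|z(t) - h(\hat x(t;\theta,R),t)\|^2_{R^{-1}} \tag{3.22b}$$

$$L_o(t_1) = -\frac{2}{t_1}\, E_o\!\left(\ln P_o(Z_0^{t_1})\right) = \ln|R_o| + m \tag{3.22c}$$

with the model-independent constant $m \ln 2\pi$ dropped in (3.22c).
We shall first show that (3.22) indeed defines an information
measure.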


Theorem 3.4 If $R > 0$ then

$$W(t_1) \ge 0$$

and the equality holds if and only if $(\theta,R)$ minimizes $L$ and

$$E_o \|h_o - h(\hat x(t;\theta,R),t)\|^2_{R^{-1}} = 0$$

for all $0 \le t \le t_1$.

Proof By recognizing that $n_o(t)$ is orthogonal to $\hat x(t;\theta,R)$,
it is not difficult to show that

$$L(\theta,R,t_1) = \ln|R| + \mathrm{Tr}(R_o R^{-1}) + \frac{1}{t_1}\sum_{t=0}^{t_1-1} E_o \|h_o - h(\hat x(t;\theta,R),t)\|^2_{R^{-1}}$$

Since $L_o(t_1)$ is the unique minimum for $\ln|R| + \mathrm{Tr}(R_o R^{-1})$,
we complete the proof of this theorem.

Note that $L_o(t_1)$ is independent of $\theta$ and $R$; therefore,
we do not need to know the exact value of $L_o(t_1)$ for the
purpose of estimate comparison. The state estimate which yields
the smallest $L(\theta,R,t_1)$ defined by (3.22b) is considered as the
best estimate in the sense of the GMKLI. In practical
applications, however, $L(\theta,R,t_1)$ cannot be computed directly
because $P_o$ is not known. To circumvent this problem, we shall
establish the equivalence relationship between the GMKLI and the
generalized maximum likelihood estimate (GMLE) which is
computable from the observed sample and the assumed model. The
GMLE of $(\theta,R)$, denoted by $(\hat\theta(t_1),\hat R(t_1))$, is defined to be the
minimum point of the following function

$$d(\theta,R,t_1) = \ln|R| + \frac{1}{t_1}\sum_{t=0}^{t_1-1} \|z(t) - h(\hat x(t;\theta,R),t)\|^2_{R^{-1}} \tag{3.23}$$
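Since (3.23) depends only on the observed sample and the filter residuals, candidate tunings can be compared by evaluating it directly. The Python sketch below does this for the process-noise variance alone, using a scalar filter with F = H = 1 and h(x,t) = x, a fixed `r`, and a small grid of candidates; these restrictions, the function names, and the ramp data are assumptions for illustration and are not the report's procedure in full generality.

```python
import math
from typing import List, Sequence

def one_step_predictions(z: Sequence[float], x0: float, sigma0: float,
                         r: float, q: float) -> List[float]:
    """Scalar filter (3.21) with F = H = 1; returns x_hat(t) for each t."""
    x_hat, sigma, preds = x0, sigma0, []
    for zt in z:
        preds.append(x_hat)
        k = sigma / (sigma + r)           # gain
        x_hat = x_hat + k * (zt - x_hat)  # predicted estimate for the next step
        sigma = (1.0 - k) * sigma + q     # predicted covariance for the next step
    return preds

def gmle_cost(z: Sequence[float], x0: float, sigma0: float, r: float, q: float) -> float:
    """(3.23) for scalar measurements with h(x, t) = x."""
    preds = one_step_predictions(z, x0, sigma0, r, q)
    mean_sq = sum((zt - p) ** 2 for zt, p in zip(z, preds)) / len(z)
    return math.log(r) + mean_sq / r

def tune_q(z: Sequence[float], x0: float, sigma0: float, r: float,
           q_grid: Sequence[float]) -> float:
    """Pick the candidate process-noise variance with the smallest cost (3.23)."""
    return min(q_grid, key=lambda q: gmle_cost(z, x0, sigma0, r, q))

# Hypothetical mismatch: the data follow a ramp but the model assumes a constant state.
ramp = [0.5 * t for t in range(100)]
chosen_q = tune_q(ramp, 0.0, 1.0, 1.0, [0.0, 1.0])
```

With no process noise the filter averages the ramp and lags far behind it, so the criterion prefers the nonzero candidate, which is the qualitative behavior that motivates tuning Q.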


As in the case of a fixed dynamical model, we shall first study
the unimodal conditions of $L(\theta,R,t_1)$ for a finite $t_1$. Let
$b = (\theta,R)$ and

$$\psi(t) = \frac{\partial \hat x(t;\theta,R)}{\partial b} \tag{3.24}$$

The generalized R-observability Gramian is defined by

$$M(b;t_1) = \sum_{t=0}^{t_1-1} \psi^T(t)\,H^T(t)\,R^{-1}\,H(t)\,\psi(t) \tag{3.25}$$

We  have  the  following  theorem. 


Theorem 3.5 Let $T$ be a compact and convex set containing
elements of $\hat x(0)$, $\Sigma(0)$, $Q$, and $R$ defined by (3.21) such that $\Sigma(0)$
and $Q$ are positive-semidefinite and $R$ is positive-definite.
Furthermore, we hypothesize that

(i) $E_o M(b;t_1) > 0$ for all $b \in T$

(ii) for any $b_1, b_2 \in T$ and $c(t) \in \mathbb{R}^q$ for each $t$,
we have the same condition as (iii) of Theorem 3.1.

Under the above hypotheses, there exists a unique minimum point
of $L(\theta,R,t_1)$ in $T$.

The proof of this theorem can be carried out in the same way
as that of Theorem 3.1. The equivalent relationship between the
GMLE and the GMKLI can be established in the same manner as in
Section 3.B.

When the hypotheses of Theorem 3.5 are too difficult to
examine, the equivalence between GMLE and GMKLI can be studied as
follows. First, we observe that $L_o(t_1)$ in the definition of
GMKLI (Eq. 3.22a) is independent of the assumed mathematical
model. The equivalent relationship will be established if
$L(\theta,R,t_1)$ can be approximated by $d(\theta,R,t_1)$ (see Eq. 3.23).
The following theorem provides conditions under which the above
two quantities coincide asymptotically.

Theorem 3.6 Under the hypotheses that

(i) $h$ is uniformly bounded in both $x$ and $t$

(a) $$\lim_{t_1 \to \infty} \frac{1}{t_1}\sum_{t=0}^{t_1-1} \|n_o(t)\|^2_{R^{-1}} = \mathrm{Tr}(R_o R^{-1}) \quad \text{w.p.1, } P_o$$

(b) $$\lim_{t_1 \to \infty} \frac{1}{t_1}\sum_{t=0}^{t_1-1} n_o^T(t)\,R^{-1}\left(h_o - h(\hat x(t;\theta,R),t)\right) = 0 \quad \text{w.p.1, } P_o$$

Result (a) can be proved by the law of large numbers because
$n_o(t)$ is assumed to be a zero-mean white Gaussian process with
a covariance matrix $R_o$. To prove (b), we first recognize that
$\sum_{t=0}^{k} n_o^T(t)\,R^{-1}(h_o - h(\hat x(t;\theta,R),t))$ is a martingale sequence
because $\hat x(t;\theta,R)$ is orthogonal to the zero-mean white process
$n_o(t)$. By Hypothesis (i), (2.3) and the discrete version of
the Khazminskii lemma [12] in [13], the claim (b) can be proved.
By (a), (b), and Hypothesis (ii), we complete the proof of the
theorem.

The meaning of Hypothesis (ii) in Theorem 3.6 becomes clearer
when we restrict $h$ to be linear with constant
coefficients. In this case, Hypothesis (ii) becomes

$$\lim_{t_1 \to \infty} \frac{1}{t_1}\sum_{t=0}^{t_1-1} \left[\|H\tilde x(t,\theta,R)\|^2 - E_o \|H\tilde x(t,\theta,R)\|^2\right] = 0 \quad \text{w.p.1} \tag{3.26}$$

where $H$ is a constant matrix and

$$\tilde x(t,\theta,R) = \varphi(x_o(0,t)) - \hat x(t;\theta,R)$$

If the filter error becomes stationary and ergodic then certainly
(3.26) holds. For most practical applications including
nonlinear state estimation problems, we find that residuals
exhibit stationary sample statistics as long as filtering
divergence does not occur.

4. CONTINUOUS-TIME UNCERTAIN SYSTEMS

Basically, we require two substitutions in order to extend
the concepts introduced in Section 3 to cover the continuous-time
systems. First, we replace the ratio $P/P_o$ in the definitions
of MKLI and GMKLI by the Radon-Nikodym derivative (RND) of the
probability measures induced by the assumed model and the true
stochastic process. Secondly, we replace the likelihood
functions in the definitions of MLE and GMLE by a likelihood
ratio in the form of an RND with respect to a certain reference
measure. These two substitutions are necessary because we are
dealing with an uncountable sample space of a continuous-time
stochastic process.

For example, if we model the continuous-time system by a
diffusion process of an Ito differential equation [15], then the
reference measure can be chosen as the Wiener measure defined
over the space of continuous functions [16]. Furthermore, the
RND of two Ito differential equations is well studied in the
literature, e.g., [15] and [17]. After the appropriate
substitutions are carried out, the analysis procedure introduced
in Section 3 is directly applicable. Here, we only present two
remarks that have been overlooked by researchers in this area,
e.g., in [18].

To introduce these two remarks, we first look at the
following simple example. Suppose that we use the model given by

$$dz(t) = a\,x(t)\,dt + \sigma\,dB(t) \tag{4.1}$$

to represent the observed scalar diffusion process $z(t)$, where
$B(t)$ is the standard Brownian motion. However, the true
representation of $z(t)$ is given by

$$dz(t) = a_o x_o(t)\,dt + \sigma_o\,dB(t) \tag{4.2}$$

There are two problems if we want to use the maximum likelihood
principle to estimate $a$ and $\sigma$ based on (4.1) and the observed
sample. The first problem arises because the induced measures $\mu$
and $\mu_o$ of (4.1) and (4.2) respectively are singular to each
other if $\sigma \ne \sigma_o$. This fact can be proved by the result
reported in [19]. The second problem is explained as follows.
Suppose $\sigma = \sigma_o$, and suppose that we use the formulae provided
in [20] and [21] for the RND directly. We should have

$$\ln(\mathrm{RND1}) = \frac{1}{\sigma^2}\left[\int_0^{t_1} (a x - a_o x_o)\,dz - \frac{1}{2}\int_0^{t_1} (a^2 x^2 - a_o^2 x_o^2)\,dt\right] \tag{4.3}$$

Considering Eq. 4.3 as a function of $a$ and $\sigma$, it is obvious that
the function is not unimodal in $a$ and $\sigma$. We arrive at the
following two observations regarding the above example.


Remark 1: The representation of $z(t)$ can be given by

$$\mu: \text{ same as (4.1) but } z(0) \sim N(0,\sigma^2)$$
$$\mu_o: \ dz(t) = a_o x_o(t)\,dt + \sigma\,dB(t); \quad z(0) \sim N(0,\sigma_o^2) \tag{4.4}$$

Remark 2: The RND of (4.4) is given by

$$\ln\frac{d\mu}{d\mu_o} = \ln(\mathrm{RND1}) - \ln\!\left(\frac{\sigma}{\sigma_o}\right) + \frac{1}{2}\,z^2(0)\left[\sigma_o^{-2} - \sigma^{-2}\right] \tag{4.5}$$

The above two remarks are the direct consequence of Theorem 5.3
of [15]. It is also clear that (4.5) is a unimodal function
of $a$ and $\sigma$. Inspired by Akaike's original idea, we appreciate
the logarithmic term.
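One way to read the extra terms in (4.5) is as the log-ratio of the two Gaussian densities assigned to z(0) in (4.4). The short Python check below compares that log-ratio with the closed form; the function names and the numerical values are hypothetical, and the check covers only the initial-condition terms, not ln(RND1).

```python
import math

def log_normal_pdf(z: float, sigma: float) -> float:
    """Log density of N(0, sigma^2) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - z ** 2 / (2.0 * sigma ** 2)

def initial_condition_term(z0: float, sigma: float, sigma_o: float) -> float:
    """-ln(sigma/sigma_o) + 0.5 z0^2 (sigma_o^-2 - sigma^-2), the last two terms of (4.5)."""
    return -math.log(sigma / sigma_o) + 0.5 * z0 ** 2 * (sigma_o ** -2 - sigma ** -2)

# Hypothetical values of z(0), sigma and sigma_o.
direct = log_normal_pdf(0.7, 2.0) - log_normal_pdf(0.7, 1.5)
closed_form = initial_condition_term(0.7, 2.0, 1.5)
```

The two numbers agree to rounding error, and the -ln(σ/σ_o) part is the logarithmic term that penalizes an inflated σ.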

5.  CONCLUSION 

We follow Akaike's original idea to exploit the connection
between the mean Kullback-Leibler's information and the maximum
likelihood principle to cover the estimation problem of
non-linear systems with significant model uncertainties. We
introduce the concept of the generalized mean Kullback-Leibler's
information and establish its relationship with the generalized
maximum likelihood principle. The results of this paper have
been applied to the trajectory estimation problem. Finally, we
present two remarks concerning the extension of the earlier part
of this paper to the diffusion process generated by the Ito
differential equation.